Disk buffers #109

Merged 4 commits on Sep 9, 2020

Contributor

@camdencheek camdencheek commented Sep 3, 2020

Description of Changes

Adds a disk buffer, a new memory buffer, and a flusher.

Things that could use specific review effort (obviously in addition to anything you'd like to add):

  • Naming. Names for config options are hard, and I'd like your opinions on what I've got.
  • Trying to break it. I'd love for this to be run on a few systems other than mine, with different workloads.
  • Thoughts on ergonomics. I split the existing buffer-flusher into a buffer and a flusher. This makes it more modular, and allows us to swap out our flusher for special cases (cabin might need to eventually), but it also makes building and starting them slightly more complex.
  • Ideas for test cases that I missed
  • Soundness concerns

To keep the scope at least somewhat limited, I left a few features undone:

  • Configuring backoff. Right now, it has sane defaults, but I'd like to make a separate task of deciding what backoff params we want to expose.
  • Default disk buffer paths. Right now, it requires specifying a path manually, but ideally, we'd have a "log agent data dir" that it can default to. This feels like a separate feature that requires separate design, especially when it comes to how it interacts with the universal agent.
  • Configurable behavior "on full"

Please check that the PR fulfills these requirements

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Add a changelog entry (for non-trivial bug fixes / features)
  • CI passes

@djaglowski
Member

| Log Files | Logs / Second | CPU Avg (%) | CPU Avg Δ (%) | Memory Avg (MB) | Memory Avg Δ (MB) |
| --- | --- | --- | --- | --- | --- |
| 1 | 1000 | 2.2069383 | -1.7069187 | 129.64372 | +94.56816 |
| 1 | 5000 | 7.4310756 | -1.9829268 | 134.68602 | +89.655304 |
| 1 | 10000 | 15.172765 | -0.8103008 | 144.37419 | +88.362335 |
| 1 | 50000 | 79.868065 | +7.7787704 | 226.3711 | +9.071793 |
| 1 | 100000 | 144.62514 | +1.068634 | 425.12527 | +118.792694 |
| 10 | 100 | 2.6034954 | -2.9828842 | 129.91621 | +96.15652 |
| 10 | 500 | 8.569094 | -3.3449135 | 138.60951 | +96.021286 |
| 10 | 1000 | 16.48351 | -2.6203403 | 145.4534 | +90.07503 |
| 10 | 5000 | 82.80417 | +5.930458 | 223.56506 | +45.546616 |
| 10 | 10000 | 156.13608 | +0.023147583 | 384.66406 | +52.43521 |

@codecov

codecov bot commented Sep 3, 2020

Codecov Report

Merging #109 into master will decrease coverage by 0.30%.
The diff coverage is 70.77%.

@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage   72.30%   72.00%   -0.30%     
==========================================
  Files          75       78       +3     
  Lines        4538     5050     +512     
==========================================
+ Hits         3281     3636     +355     
- Misses        973     1056      +83     
- Partials      284      358      +74     
| Impacted Files | Coverage Δ |
| --- | --- |
| operator/builtin/output/elastic.go | 16.04% <0.00%> (-1.31%) ⬇️ |
| operator/duration.go | 70.27% <0.00%> (-6.20%) ⬇️ |
| operator/helper/input.go | 74.36% <0.00%> (-4.59%) ⬇️ |
| operator/helper/parser.go | 83.05% <0.00%> (-3.16%) ⬇️ |
| operator/buffer/disk_metadata.go | 42.86% <42.86%> (ø) |
| ...perator/builtin/output/googlecloud/google_cloud.go | 48.63% <46.67%> (+2.20%) ⬆️ |
| commands/offsets.go | 64.71% <50.00%> (-1.96%) ⬇️ |
| entry/record_field.go | 90.24% <66.67%> (ø) |
| entry/field.go | 81.82% <71.43%> (+0.21%) ⬆️ |
| operator/buffer/disk.go | 76.47% <76.47%> (ø) |

... and 13 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4062c97...e003d36.

Member

@djaglowski djaglowski left a comment

I have only reviewed the interface & docs so far, but will circle back to the rest later today. Here's feedback on what I've reviewed so far though.

docs/operators/elastic_output.md (outdated, resolved)
docs/operators/google_cloud_output.md (outdated, resolved)
docs/types/buffer.md (outdated, resolved)
docs/types/buffer.md (resolved)
docs/types/buffer.md (resolved)

| Field | Default | Description |
| --- | --- | --- |
| `max_size` | `4294967296` (4GiB) | The maximum size of the disk buffer file in bytes |
Member

Although this is documented well here, it might be slightly clearer to call this `max_bytes`, since many users will only encounter this in a config file and would either have to make assumptions about the units, or dig up the docs.

Contributor Author

I initially thought the same thing, but I would love to implement something like this as a future feature. Would it still make sense to do `max_bytes` if the value is `8GB`?

Member

That's a good point. I guess it comes down to how likely we are to implement that feature and how soon it would happen. If we're going to end up living with the current implementation for a while, then I think we should consider using the more explicit term now and deprecating it later. We could pretty easily support both and just require at most one of the two be specified.
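The "support both and require at most one" idea can be sketched as a small validation step. This is only an illustration under stated assumptions: the `diskBufferConfig` struct, its pointer fields, and `resolveMaxBytes` are hypothetical, and `max_bytes` is the proposed name from this thread rather than an existing option. The default comes from the docs table above.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultMaxBytes mirrors the documented default of 4294967296 (4GiB).
const defaultMaxBytes int64 = 4294967296

// diskBufferConfig is a hypothetical config struct. Pointers are used
// so that "unset" can be told apart from an explicit zero.
type diskBufferConfig struct {
	MaxSize  *int64 // max_size, the name currently in the docs
	MaxBytes *int64 // max_bytes, the more explicit name proposed here
}

var errBothSet = errors.New("specify at most one of max_size and max_bytes")

// resolveMaxBytes returns the effective limit in bytes, rejecting
// configs that set both names and falling back to the default when
// neither is set.
func (c diskBufferConfig) resolveMaxBytes() (int64, error) {
	switch {
	case c.MaxSize != nil && c.MaxBytes != nil:
		return 0, errBothSet
	case c.MaxBytes != nil:
		return *c.MaxBytes, nil
	case c.MaxSize != nil:
		return *c.MaxSize, nil
	default:
		return defaultMaxBytes, nil
	}
}

func main() {
	limit := int64(1 << 30)
	cfg := diskBufferConfig{MaxBytes: &limit}
	n, err := cfg.resolveMaxBytes()
	fmt.Println(n, err)
}
```

With this shape, deprecating one name later only means logging a warning in the corresponding `case`, without changing how the rest of the buffer reads the limit.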

docs/types/flusher.md (outdated, resolved)
docs/types/flusher.md (outdated, resolved)
@djaglowski
Member

| Log Files | Logs / Second | CPU Avg (%) | CPU Avg Δ (%) | Memory Avg (MB) | Memory Avg Δ (MB) |
| --- | --- | --- | --- | --- | --- |
| 1 | 1000 | 2.0517936 | -1.8620634 | 128.79364 | +93.71808 |
| 1 | 5000 | 7.5174274 | -1.896575 | 136.77924 | +91.74852 |
| 1 | 10000 | 15.207217 | -0.7758484 | 144.08014 | +88.06828 |
| 1 | 50000 | 78.463455 | +6.374161 | 228.28516 | +10.985855 |
| 1 | 100000 | 147.36679 | +3.8102875 | 420.82773 | +114.49515 |
| 10 | 100 | 2.6034832 | -2.9828963 | 129.60587 | +95.84617 |
| 10 | 500 | 8.344985 | -3.5690222 | 136.75606 | +94.16783 |
| 10 | 1000 | 16.759014 | -2.3448353 | 145.37177 | +89.99339 |
| 10 | 5000 | 82.11762 | +5.2439117 | 206.09819 | +28.079742 |
| 10 | 10000 | 153.5577 | -2.5552368 | 369.03366 | +36.80481 |

defer d.Unlock()
defer func() { d.lastCompaction = time.Now() }()

// So how does this work? The goal here is to remove all flushed entries from disk,
Member

😮 Nice
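The quoted comment is cut off, so the PR's actual compaction algorithm is not visible here. As a rough illustration of the stated goal only (removing flushed entries while keeping the rest in order), the sketch below compacts an in-memory slice. The `record` type and `compact` function are hypothetical, and a real disk buffer would be moving byte ranges in a file rather than slice elements.

```go
package main

import "fmt"

// record is a hypothetical stand-in for an on-disk entry.
type record struct {
	data    string
	flushed bool // true once the entry has been delivered
}

// compact drops flushed records in place and preserves the order
// of the remaining ones. It reuses the input's backing array.
func compact(records []record) []record {
	kept := records[:0]
	for _, r := range records {
		if !r.flushed {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	records := []record{
		{data: "a", flushed: true},
		{data: "b"},
		{data: "c", flushed: true},
		{data: "d"},
	}
	for _, r := range compact(records) {
		fmt.Println(r.data)
	}
}
```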

Member

@djaglowski djaglowski left a comment

This looks great to me. Nice job on the detailed design. Awesome test coverage.

@camdencheek camdencheek merged commit d67cf06 into master Sep 9, 2020
@camdencheek camdencheek deleted the disk-buffer branch September 9, 2020 13:15
3 participants